gendered language
Beyond Content: How Grammatical Gender Shapes Visual Representation in Text-to-Image Models
Saeed, Muhammed, Raza, Shaina, Vayani, Ashmal, Abdul-Mageed, Muhammad, Emami, Ali, Shehata, Shady
Research on bias in Text-to-Image (T2I) models has primarily focused on demographic representation and stereotypical attributes, overlooking a fundamental question: how does grammatical gender influence visual representation across languages? We introduce a cross-linguistic benchmark examining words where grammatical gender contradicts stereotypical gender associations (e.g., "une sentinelle" — grammatically feminine in French but referring to the stereotypically masculine concept "guard"). Our dataset spans five gendered languages (French, Spanish, German, Italian, Russian) and two gender-neutral control languages (English, Chinese), comprising 800 unique prompts that generated 28,800 images across three state-of-the-art T2I models. Our analysis reveals that grammatical gender dramatically influences image generation: masculine grammatical markers increase male representation to 73% on average (compared to 22% with gender-neutral English), while feminine grammatical markers increase female representation to 38% (compared to 28% in English). These effects vary systematically by language resource availability and model architecture, with high-resource languages showing stronger effects. Our findings establish that language structure itself, not just content, shapes AI-generated visual outputs, introducing a new dimension for understanding bias and fairness in multilingual, multimodal systems.
Leveraging Large Language Models to Measure Gender Bias in Gendered Languages
Derner, Erik, de la Fuente, Sara Sansalvador, Gutiérrez, Yoan, Moreda, Paloma, Oliver, Nuria
Gender bias in text corpora used in various natural language processing (NLP) contexts, such as for training large language models (LLMs), can lead to the perpetuation and amplification of societal inequalities. This is particularly pronounced in gendered languages like Spanish or French, where grammatical structures inherently encode gender, making the bias analysis more challenging. Existing methods designed for English are inadequate for this task due to the intrinsic linguistic differences between English and gendered languages. This paper introduces a novel methodology that leverages the contextual understanding capabilities of LLMs to quantitatively analyze gender representation in Spanish corpora. By utilizing LLMs to identify and classify gendered nouns and pronouns in relation to their reference to human entities, our approach provides a nuanced analysis of gender biases. We empirically validate our method on four widely-used benchmark datasets, uncovering significant gender disparities with a male-to-female ratio ranging from 4:1 to 6:1. These findings demonstrate the value of our methodology for bias quantification in gendered languages and suggest its application in NLP, contributing to the development of more equitable language technologies.
Gender, names and other mysteries: Towards the ambiguous for gender-inclusive translation
Saunders, Danielle, Olsen, Katrina
The vast majority of work on gender in MT focuses on 'unambiguous' inputs, where gender markers in the source language are expected to be resolved in the output. Conversely, this paper explores the widespread case where the source sentence lacks explicit gender markers, but the target sentence contains them due to richer grammatical gender. We particularly focus on inputs containing person names. Investigating such sentence pairs casts a new light on research into MT gender bias and its mitigation. We find that many name-gender co-occurrences in MT data are not resolvable with 'unambiguous gender' in the source language, and that gender-ambiguous examples can make up a large proportion of training examples. From this, we discuss potential steps toward gender-inclusive translation which accepts the ambiguity in both gender and translation.
Transcending the "Male Code": Implicit Masculine Biases in NLP Contexts
Seaborn, Katie, Chandra, Shruti, Fabre, Thibault
Critical scholarship has elevated the problem of gender bias in data sets used to train virtual assistants (VAs). Most work has focused on explicit biases in language, especially against women, girls, femme-identifying people, and genderqueer folk; implicit associations through word embeddings; and limited models of gender and masculinities, especially toxic masculinities, conflation of sex and gender, and a sex/gender binary framing of the masculine as diametric to the feminine. Yet, we must also interrogate how masculinities are "coded" into language and the assumption of "male" as the linguistic default: implicit masculine biases. To this end, we examined two natural language processing (NLP) data sets. We found that when gendered language was present, so were gender biases and especially masculine biases. Moreover, these biases related in nuanced ways to the NLP context. We offer a new dictionary called AVA that covers ambiguous associations between gendered language and the language of VAs.
'World of Warcraft: Dragonflight' won't use gendered language in its character generator
World of Warcraft: Dragonflight is joining the ranks of games with more inclusive character generators. Both Wowhead and Polygon note that the expansion's new alpha release has dropped gendered language from its character creator. Instead of the male and female options you frequently see in these tools, characters are now divided into respective "Body 1" and "Body 2" sections. While these effectively offer the same characteristics as before, you can now build a gender-nonconforming adventurer without any awkward wording. Wowhead also found code suggesting that you may get to choose he/him, she/her, and they/them pronouns in a future release, which could help other players address your character accordingly.
A Gramulator Analysis of Gendered Language in Cable News Reportage
Wen, Xin (University of Memphis) | McCarthy, Philip Michael (Decooda International) | Strain, Amber Chauncey (University of Memphis)
News reportage is intended to serve the public by nurturing a better understanding of political and societal concerns. But such a goal may be stymied if reporters lack a sufficient understanding of the effect gendered language may have on the conveyance and interpretation of news. To address this issue, we use the Gramulator to conduct an applied natural language processing study of the linguistic and topical features of gendered language in news reportage. Our goal is to offer some insights as to how the choice of language and topics might affect the efficacy of news reportage. Results suggest that current news reportage largely conforms to an established gender divide: specifically, we find evidence that male reportage is more quantitative and likely to focus on topics such as politics, crime, and the military. By contrast, female reportage is more qualitative and likely to focus on issues such as home and education. The study is of interest to all current affairs writers (e.g., journalists) because it offers a systematic approach to identifying and assessing the linguistic and topical differences that contribute to gendered language in news reportage.